Abstract

In this paper we propose a novel form of clipping mitigation in OFDM using compressive sensing 
that completely avoids tone reservation and hence rate loss for this purpose. The method builds on 
selecting the most reliable perturbations from the constellation lattice upon decoding at the receiver, 
and performs compressive sensing over these observations in order to completely recover the temporally 
sparse nonlinear distortion. As such, the method provides a unique practical solution to the problem of 
initial erroneous decoding decisions in iterative ML methods, offering both the ability to augment these 
techniques and to solely recover the distorted signal in one shot. 



I. Introduction 

Multicarrier signalling schemes such as Orthogonal Frequency Division Multiplexing (OFDM) have an 
inherent sensitivity to nonlinear distortion at all stages of the transmission process. To obtain information 
about the nonlinear temporal distortion in an OFDM signal, the majority of receiver-based mitigation 
techniques begin with observing the deviation of the equalized frequency domain variables from the 
discrete symbol constellation. As useful as this may be, an inherent inconsistency persists.
After all, it is the position of those very symbols in the frequency domain that ultimately determines
our decoding decisions, and should any of those symbols be perturbed outside its correct decision
boundary by nonlinear distortion, any further reliance on the resulting erroneous measurements
may resist further correction. At the same time, refraining from using part of the
deviations in recovering the distortion reduces the effectiveness of the mitigating algorithm.

Our major contributions are then, first, to suggest algorithms that can use a subset of the deviations
in the frequency domain to both avoid erroneous decisions and recover from the distortion with no
theoretical sacrifice of given information and thus performance, and second, to tailor the input model
to these algorithms by selecting the most appropriate set of observations using a simplified procedure
that models an actual Bayesian reliability measure. Although many scenarios and modifications apply to
the methods herein, due to the limited space and the ongoing development of the presented concepts, we
will restrict our discussion to mitigating distortion caused by clipping at the transmitter, and defer more
elaborate applications to a further treatment.

Unless otherwise noted, frequency-domain variables will be represented by uppercase italic letters,
while lowercase letters will be reserved for time-domain variables. The lower index in $\mathcal{X}_i$ will denote
the $i$th constellation point in an $M$-ary alphabet $\mathcal{X}$, while $A_i(k)$ will be used for the $k$th scalar
coefficient of the $i$th column vector $A_i$ of matrix $A$. Furthermore, $\langle X(k) \rangle$ will denote a hard decoding
operation which maps $X(k)$ back into $\mathcal{X}$. The standard notation $X_{i:N}$ will be used for the $i$th order
statistic in a sample of $N$ random variables of a common probability density function [1]. Finally, we
use $\mathcal{F}$ for Cumulative Distribution Functions (CDF) and $F$ for unitary Fourier matrices.

II. Transmission and Clipping Model 

In an OFDM system, serially incoming bits are mapped into an $M$-ary QAM alphabet
$\{\mathcal{X}_0, \mathcal{X}_1, \ldots, \mathcal{X}_{M-1}\}$ and concatenated to form an $N$-dimensional data vector
$X = [X(0)\ X(1)\ \cdots\ X(N-1)]^T$. The time-domain signal is obtained by an IFFT operation so that
$x = F^H X$, where

$$F_k(\ell) = N^{-1/2} e^{-j 2\pi k \ell / LN}, \qquad \ell \in \{0, 1, \ldots, LN - 1\},$$

and $L$ is an oversampling factor. Since $x$ has a high peak-to-average power ratio (PAPR), the digital
samples are subject to a magnitude limiter which saturates its operands to a value of $\gamma$, and hence instead
of feeding $x$ to the power amplifier, we feed $\bar{x}$, where

$$\bar{x}(i) = \begin{cases} x(i), & |x(i)| \le \gamma \\ \gamma e^{j\theta_{x(i)}}, & |x(i)| > \gamma \end{cases} \tag{1}$$

and $\theta_{x(i)}$ is the phase of $x(i)$. This soft limiting operation can be conveniently thought of as adding
a peak-reducing signal $c$ to $x$, whereby its low-PAPR counterpart $\bar{x} = x + c$ is transmitted instead, and
whereby $x$ can be regenerated at the receiver by estimating $c$. What's more, by setting a typical clipping
threshold $\gamma$ on $x$, $c$ is controllably sparse in time by the impulsive nature of $x$, and dense in frequency
by the uncertainty principle. We will denote its temporal support by $\mathcal{I}_c = \{n : c(n) \neq 0\}$ and always
maintain the practical assumption that $|\mathcal{I}_c| \ll N$.
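As a minimal sketch of the soft limiter described above, the following NumPy snippet clips a randomly generated 16-QAM OFDM symbol and extracts the peak-reducing signal $c$. The function name `soft_clip`, the choice of no oversampling ($L = 1$), and the threshold of 1.6 times the RMS amplitude are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def soft_clip(x, gamma):
    """Saturate the magnitude of x at gamma while preserving its phase."""
    mag = np.abs(x)
    return np.where(mag > gamma, gamma * np.exp(1j * np.angle(x)), x)

rng = np.random.default_rng(0)
N = 64                                        # number of subcarriers (assumed)
levels = np.array([-3.0, -1.0, 1.0, 3.0])     # 16-QAM levels per dimension
X = rng.choice(levels, N) + 1j * rng.choice(levels, N)

x = np.fft.ifft(X, norm="ortho")              # unitary IFFT, L = 1 (assumed)
gamma = 1.6 * np.sqrt(np.mean(np.abs(x) ** 2))  # illustrative threshold

x_bar = soft_clip(x, gamma)                   # clipped signal fed to the amplifier
c = x_bar - x                                 # peak-reducing signal
support = np.flatnonzero(np.abs(c) > 1e-12)   # temporal support of c
print(f"{len(support)} of {N} samples were clipped")
```

Only the samples whose magnitude exceeds the threshold contribute to `c`, which is what makes it sparse in time for typical thresholds.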

In the frequency domain, this translates to transmitting $\bar{X} = X + C$, with complex coefficients that are
now randomly pre-perturbed from the lattice $\mathcal{X}$, followed by additional random post-perturbations by the
channel $H = F^H \Lambda F$ and additive noise samples $Z \sim \mathcal{CN}(0, \sigma_Z^2 I_{N \times N})$ at the receiver, where the circulant
channel $H$ has been decomposed as such by virtue of the added cyclic prefix in OFDM signalling. At
the receiver, this reads

$$Y = \Lambda \bar{X} + Z, \tag{2}$$

where we will make the practical assumption that the channel coefficients are known at the receiver.
Consequently, $\bar{X}$ can be directly estimated scalar-wise from $Y$, i.e.

$$\hat{X}(k) = \Lambda_k^{-1}(k) Y(k) = X(k) + C(k) + \Lambda_k^{-1}(k) Z(k). \tag{3}$$

Let $D(k) = C(k) + \Lambda_k^{-1}(k) Z(k)$ denote the general distortion on the frequency-domain sample $X(k)$.$^1$
A naive ML decoder will now simply map $\hat{X}(k)$ to the nearest constellation point $\mathcal{X}_{i^*}$ to recover $X(k)$,
where $i^*(k) = \arg\min_i |\hat{X}(k) - \mathcal{X}_i|$, treating the clipping distortion as additive noise. Although such a
hard-decoding scheme is very efficient at high SNR in the classical AWGN scenario, the clipping scenario
introduces another $\gamma$-dependent source of perturbation which is immune to any increase in SNR.
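The naive hard-decoding step can be sketched as a nearest-neighbor search over the alphabet. The helper name `hard_decode` and the three sample values are hypothetical, chosen only to illustrate the mapping.

```python
import numpy as np

def hard_decode(X_hat, alphabet):
    """Map each equalized sample to the nearest constellation point."""
    dist = np.abs(X_hat[:, None] - alphabet[None, :])   # all pairwise distances
    idx = np.argmin(dist, axis=1)                       # index i*(k) per tone
    return alphabet[idx], idx

levels = np.array([-3.0, -1.0, 1.0, 3.0])
alphabet = (levels[:, None] + 1j * levels[None, :]).ravel()   # 16-QAM

X_hat = np.array([0.9 + 1.2j, -2.6 - 0.1j, 3.4 + 3.3j])       # illustrative samples
decoded, idx = hard_decode(X_hat, alphabet)
print(decoded)
```

The decoder has no way of telling whether a deviation came from noise or from clipping, which is why a large clipping perturbation leads directly to a wrong decision.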

An intelligent ML decoder will hence have to iteratively update its decisions in the frequency domain 
based on the resulting waveforms in the time domain. Unfortunately, such a method will suffer from 
error propagation since a single faulty decision in frequency will generate a faulty estimate of c in time 
which will be used to update the frequency perturbations in the next iteration and so on. 

$^1$ $D(k)$ is a random variable with a PDF that is a function of $\gamma$, $\Lambda_k^{-1}(k)$, $\sigma_Z$, and a compound distribution $f_{C(k)}$ which must
be conditioned and then marginalized over the random support $\mathcal{I}_c$. We avoid presenting its derivation and justifying its proximity
to a Gaussian in this paper due to lack of space, and directly treat it as a circularly symmetric variable with parameter $\sigma_{D(k)}$.
For the same reason, we also express functions compactly in terms of $f_{D(k)}(\cdot)$ by manipulating its argument only.

A direct countermeasure would be to refrain from using the tones at which the perturbations $D(k)$
are large and hence unreliable [6]. Although this should eliminate false positives in the time domain, the
economy in tone usage severely limits the improvement offered by such an approach.

Alternatively, CS seems to be a very sensible solution to this problem. A partial observation of the 
frequency content of a sparse signal in the time domain is sufficient to recover c and hence C in one 
shot. This would certainly get around the problem of unreliable perturbations as CS algorithms can be 
totally blind to them and still offer near optimal signal reconstruction under mild conditions. 

Fortunately, unlike our previous approach [2] of reserving a sufficient number of tones at the transmitter
to recover $c$, and consequently reducing the transmission rate, we do not require any tone reservation in
this method, and are completely free to choose any subset $\Omega_m$ from the $N$ data-carrying tones in order
to reconstruct $c$ at the receiver. This freedom of choice opens up many possibilities in how to select
particular adaptive subsets to optimize the CS performance, as will be thoroughly discussed later on.

III. Development of Compressive Sensing Models with No Tone Reservation 

With the addition of $C$ to the data vector $X$, we suspect that a part of the data samples $X(k)$ will
be severely perturbed to fall out of their corresponding decision regions $A_{X(k)}$. Denote by
$\Omega_T = \{k : \langle X(k) + C(k) \rangle = X(k)\}$ the subset of data tones in $\Omega$ in which the perturbations are not severe (i.e. do
not cause crossing a decision boundary). At these locations, the equality $\langle \bar{X}(k) \rangle = X(k)$ is true and
hence $C_{\Omega_T} = \bar{X}_{\Omega_T} - \langle \bar{X}_{\Omega_T} \rangle$ at the transmitter. More generally,

$$S_{\Omega_T} C = S_{\Omega_T} (\bar{X} - \langle \bar{X} \rangle),$$

where $S_{\Omega_T}$ is an $N \times N$ diagonal and binary selection matrix, with $|\Omega_T|$ ones along its diagonal that
extract the locations in the vector $\bar{X} - \langle \bar{X} \rangle$ according to the tone set $\Omega_T$ while nulling the others, and
$\bar{S}_{\Omega_T}$ is its complement such that $S_{\Omega_T} \bar{S}_{\Omega_T} = 0_{N \times N}$. Practically speaking, $\Omega_T$ constitutes the bigger part
of the general tone set $\Omega$, with a probability of occupying at least $100\alpha\%$ of $\Omega$ equal to

$$\Pr(|\Omega_T| > \alpha N) \approx \sum_{\ell > \alpha N} \binom{N}{\ell} (1 - P_e)^{\ell} P_e^{N - \ell}$$

for large constellations, where the per-tone error probability $P_e$ takes the form $2Q(\cdot)$. An essential part
of OFDM signal recovery obviously constitutes finding this set, and correcting the distortion over the
remaining tones to finally reach $\Omega_T = \Omega$.

Upon demodulation and decoding at the receiver, we are left with an estimate $\hat{X}$ of the distorted data
vector given in (3) along with its associated decoded vector $\langle \hat{X} \rangle \in \mathcal{X}^N$. Taking the difference yields

$$\hat{X} - \langle \hat{X} \rangle = X + D - \langle X + D \rangle = X + D - (S_{\Omega_T} X + \bar{S}_{\Omega_T} E), \tag{4}$$

where $\Omega_T$ now indexes the locations where $X(k) + D(k)$ remains within the correct ML decision region
and $E$ represents the error vector resulting from incorrect decoding decisions at $\bar{\Omega}_T$. Multiplying both
sides by $S_{\Omega_T}$ leaves us with

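$$
\begin{aligned}
S_{\Omega_T} (\hat{X} - \langle \hat{X} \rangle) &= S_{\Omega_T} X + S_{\Omega_T} D - S_{\Omega_T} (S_{\Omega_T} X + \bar{S}_{\Omega_T} E) \\
&= S_{\Omega_T} X + S_{\Omega_T} D - S_{\Omega_T} X + 0_{N \times 1} \\
&= S_{\Omega_T} D \\
&= S_{\Omega_T} F c + S_{\Omega_T} \Lambda^{-1} Z,
\end{aligned} \tag{5}
$$

where we have used the fact that $S_{\Omega_T}^n = S_{\Omega_T}$ for any positive integer $n$, and redundantly used $\bar{S}_{\Omega_T}$ on
$E$ to show that $S_{\Omega_T} E = S_{\Omega_T} \bar{S}_{\Omega_T} E = 0_{N \times 1}$. Note, however, that we do not require all of $\Omega_T$ to recover
$c$, for obviously there would be no need for any recovery algorithm if we knew $\Omega_T$. Rather, we only
require an arbitrary subset $\Omega_m \subset \Omega_T \subset \Omega$ of cardinality $|\Omega_m| < |\Omega_T|$ to correctly recover $c$ by CS. As
a result, we can replace the equation above with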

$$S_{\Omega_m} (\hat{X} - \langle \hat{X} \rangle) = S_{\Omega_m} F c + S_{\Omega_m} \Lambda^{-1} Z = \Psi c + Z',$$

where $\Psi = S_{\Omega_m} F$, $Z' = S_{\Omega_m} \Lambda^{-1} Z$, and where we further let $Y' = S_{\Omega_m} (\hat{X} - \langle \hat{X} \rangle)$ denote the observation
vector of the differences over the tones in $\Omega_m$, nulled at the discarded measurements. This leads us to
the lossless-rate CS model

$$Y'_{\Omega_m} = \Psi_{\Omega_m} c + Z'_{\Omega_m}, \tag{6}$$

where $Y'_{\Omega_m}$ is the $|\Omega_m|$-dimensional vector collecting the nonzero coefficients in $Y'$. Such a generic model
can now be processed for c using any compressive sensing technique, be it convex programming, greedy 
pursuit, or iterative thresholding, and a very flexible region for tradeoff exists in regard to performance 
and complexity. In any case, our subsequent objective is to scrutinize the general conditioning of the 
model itself by supplying our most reliable observations to the generic CS algorithm. 
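The construction of the measurement model can be sketched numerically as follows. This is an illustration under simplifying assumptions that are not part of the paper's setup: a flat, noise-free channel, no oversampling, a hand-placed 4-sparse `c`, and an oracle that reveals which tones decoded correctly (a real receiver does not know this set, which is what the reliability selection is for). All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
levels = np.array([-3.0, -1.0, 1.0, 3.0])
alphabet = (levels[:, None] + 1j * levels[None, :]).ravel()   # 16-QAM

def hard_decode(V):
    """Nearest-constellation-point decision for every tone."""
    return alphabet[np.argmin(np.abs(V[:, None] - alphabet[None, :]), axis=1)]

F = np.fft.fft(np.eye(N), norm="ortho")       # unitary DFT matrix
X = rng.choice(alphabet, N)                   # transmitted data symbols

c = np.zeros(N, dtype=complex)                # sparse time-domain distortion
c[rng.choice(N, 4, replace=False)] = 3.0 * np.exp(2j * np.pi * rng.random(4))

X_hat = X + F @ c                             # noise-free, flat channel (assumed)
decoded = hard_decode(X_hat)
diff = X_hat - decoded                        # observed deviations

omega_T = np.flatnonzero(decoded == X)        # oracle: correctly decoded tones
omega_m = omega_T[: len(omega_T) // 2]        # any subset of them will do

Psi = F[omega_m, :]                           # selected rows of the DFT matrix
Y = diff[omega_m]                             # compressed observation vector
print(np.allclose(Y, Psi @ c))
```

On correctly decoded tones the deviation equals the frequency-domain distortion exactly, so the observations are linear measurements of `c` through a partial Fourier matrix.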

IV. Cherry Picking $\Omega_m$

An essential question now is how one is to select among the $\binom{N}{m}$ possible constructions of $\Omega_m$. A
general strategy of CS techniques is to select these $m$ tones randomly for near-optimum performance.
Although possible in this scenario, such a strategy neglects the fact that our observations vary in their
credibility as true frequency-domain measurements of $C$, since
our assumption that $\hat{X}(k) - \langle \hat{X}(k) \rangle = D(k)$ is probabilistic.

Fig. 1. Variation of the reliability of observation $\hat{X}(k) - \langle \hat{X}(k) \rangle$ as the relative distances between it and the other constellation
points change with $\theta_{\hat{X}(k) - \langle \hat{X}(k) \rangle}$.

Furthermore, random selection neglects the fact that the
estimation signal-to-noise ratio $E[\|\Psi_{\Omega_m} c\|_2^2] / E[\|Z'_{\Omega_m}\|_2^2]$ also varies with the channel gains $\{\Lambda_k(k)\}_{k \in \Omega_m}$,
and that knowledge of these gains has an effect on our reliability estimates.$^2$ With the receiver risking
faulty decisions, it must devise a procedure to select the most reliable set of observations over which to
sense. This could be done based on the posterior probability of $D(k)$ equalling $\hat{X}(k) - \langle \hat{X}(k) \rangle$
relative to the probability of it equalling some other difference $\hat{X}(k) - \mathcal{X}_{i \neq i^*}$. More precisely, let

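$$\mathfrak{R}(k) = \log \frac{\Pr(\langle \hat{X}(k) \rangle = X(k) \mid \hat{X}(k))}{\Pr(\mathcal{X}_{NN}(k) = X(k) \mid \hat{X}(k))} = \log \frac{\Pr(D(k) = \hat{X}(k) - \langle \hat{X}(k) \rangle)}{\Pr(D(k) = \hat{X}(k) - \mathcal{X}_{NN}(k))} \tag{7}$$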

define the reliability in decoding $\hat{X}(k)$ to the closest constellation point relative to decoding to the
nearest neighbor $\mathcal{X}_{NN}(k)$. The minimum certainty occurs at the boundary of the decision region and
attains $\mathfrak{R}_{\min}(k) = 0$. At such tones, we would be highly skeptical of whether $D(k) = \hat{X}(k) - \langle \hat{X}(k) \rangle$
or $D(k) = \hat{X}(k) - \mathcal{X}_{NN}(k)$, and would hence be supplying a plausibly false measurement to the CS
algorithm. Instead, assume we only chose tones where $|\hat{X}(k) - \langle \hat{X}(k) \rangle|$ were confined to a disk of radius
$r$. In such a case, the minimum reliability would increase to
$\mathfrak{R}_{\min}(k) = \log \frac{f_{D(k)}(r)}{f_{D(k)}(d_{\min} - r)}$ in case of the nearest neighbor $\mathcal{X}_{NN}$, and to
$\mathfrak{R}(k) = \log \frac{f_{D(k)}(r)}{f_{D(k)}(\sqrt{2} d_{\min} - r)}$ for the next nearest neighbor $\mathcal{X}_{NNN}$ measured
in the direction of a decision region's corner.

$^2$ We will refer to this ratio as the clipper-to-noise ratio (CNR) in order not to confuse it with the transmission model's SNR,
$E[\|\Lambda \bar{X}\|_2^2] / E[\|Z\|_2^2]$.

The reliability of a measurement at each tone is then a
function $\mathfrak{R}(k)$ that maps a 3-tuple $(|\hat{X}(k) - \langle \hat{X}(k) \rangle|,\ \theta_{\hat{X}(k) - \langle \hat{X}(k) \rangle},\ \Lambda_k^{-1}(k))$ into $\mathbb{R}^+$. Fig. 1 illustrates
this concept such that, for example, even though $|\hat{X}_1(k) - \langle \hat{X}(k) \rangle| = |\hat{X}_2(k) - \langle \hat{X}(k) \rangle|$, we have

$$\frac{|\hat{X}_1(k) - \langle \hat{X}(k) \rangle|}{|\hat{X}_1(k) - \mathcal{X}_a|} > \frac{|\hat{X}_2(k) - \langle \hat{X}(k) \rangle|}{|\hat{X}_2(k) - \mathcal{X}_a|}$$

and so the reliability of assuming $D_2(k) = \hat{X}_2(k) - \langle \hat{X}(k) \rangle$ is higher than the reliability of assuming
$D_1(k) = \hat{X}_1(k) - \langle \hat{X}(k) \rangle$, although $f_{D(k)}(\hat{X}_1(k) - \langle \hat{X}(k) \rangle) = f_{D(k)}(\hat{X}_2(k) - \langle \hat{X}(k) \rangle)$ by the circular
symmetry assumption on $D(k)$. Ultimately, we would choose our measurements according to the tones
associated with the highest $m$ reliability outputs, i.e.

$$\Omega_m = \arg \{\mathfrak{R}_{i:N}\}_{i=N-m+1}^{N}. \tag{8}$$

Luckily, the locations of these tones are random and hence such a selection also preserves the
near-optimality of random tone selection for generic CS performance.
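Picking the tones with the $m$ largest reliabilities amounts to taking the top order statistics of the reliability vector. A minimal sketch follows, where the function name `select_most_reliable` and the six reliability values are made up for illustration.

```python
import numpy as np

def select_most_reliable(reliability, m):
    """Return the (sorted) indices of the m tones with the highest reliability."""
    order = np.argsort(reliability)   # ascending: order statistics 1..N
    return np.sort(order[-m:])        # keep the top m, report in tone order

reliability = np.array([0.2, 3.1, 0.0, 1.7, 2.4, 0.9])   # illustrative values
omega_m = select_most_reliable(reliability, 3)
print(omega_m)   # tones 1, 3 and 4 carry the three largest values
```

Taking `order[:m]` instead would give the least reliable set, which is useful as a sanity-check baseline.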



A. Bayesian Reliability 

Using the reasoning based on the probability $\Pr(\langle \hat{X}(k) \rangle = X(k) \mid \hat{X}(k))$, an exact expression for the
reliability could be a direct generalization of (7), namely,

$$\mathfrak{R}(k) = \log \frac{f_{D(k)}(\hat{X}(k) - \langle \hat{X}(k) \rangle)}{\mathfrak{R}_{\min} \sum_{i \neq i^*} f_{D(k)}(\hat{X}(k) - \mathcal{X}_i)} \tag{9}$$

where the constant $\mathfrak{R}_{\min}$ is inserted to compensate for the rare worst-case scenarios and preserve $\mathfrak{R}(k) \geq 0$.
For example, $\mathfrak{R}_{\min} = 1/3$ would be sufficient for the case when $\hat{X}(k)$ falls on the center point between
four constellation points. Unfortunately, this pursuit of exact reliability computation is inefficient. Even
if we truncate the summation in (9) to the nearest neighbors, the method would still require repeating
redundant evaluations of $f_{D(k)}(\cdot)$. What is required is then a method that could approximate $\mathfrak{R}(k)$ based
solely on the observation $\hat{X}(k) - \langle \hat{X}(k) \rangle$ with no reference to any other constellation point $\mathcal{X}_i$.
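To make the cost of the exact computation concrete, the sketch below evaluates a log-ratio reliability of this kind for every tone against every constellation point. It assumes an unnormalized, circularly symmetric Gaussian shape for $f_{D(k)}$ with a single parameter `sigma_d`, and a compensating constant `r_min = 1/3`; the function name and sample values are hypothetical.

```python
import numpy as np

def bayes_reliability(X_hat, alphabet, sigma_d, r_min=1.0 / 3.0):
    """Log-ratio of the decoded point's likelihood to all competing points."""
    d2 = np.abs(X_hat[:, None] - alphabet[None, :]) ** 2
    f = np.exp(-d2 / sigma_d**2)                 # assumed Gaussian-shaped f_D
    nearest = np.argmin(d2, axis=1)              # hard decision per tone
    num = f[np.arange(len(X_hat)), nearest]
    den = r_min * (f.sum(axis=1) - num)          # every other candidate
    return np.log(num / den)

levels = np.array([-3.0, -1.0, 1.0, 3.0])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()

# One sample on a constellation point, one close to a decision boundary.
rel = bayes_reliability(np.array([1.0 + 1.0j, 1.9 + 1.0j]), qam16, sigma_d=1.0)

# Worst case for a 4-point alphabet: the sample sits at the common corner.
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
rel_center = bayes_reliability(np.array([0.0 + 0.0j]), qpsk, sigma_d=1.0)
print(rel, rel_center)
```

Each tone requires $M$ density evaluations here, which is the redundancy the geometric approximation is meant to avoid.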

B. Practical Geometric-Based Reliability Computation

The competing constellation points can be accounted for by considering the magnitude and phase of
our observation against the location of $\langle \hat{X}(k) \rangle$ within the constellation plane. For example, an observation
with $\langle \hat{X}(k) \rangle$ being a midpoint in a large rectangular constellation will have a higher reliability if its
phase were along $\{\frac{\pi}{4} + \frac{\pi}{2} i,\ i = 0, 1, 2, 3\}$, compared to an observation with the same
magnitude pointing in a different direction, which ultimately reaches a minimum reliability at phases
$\{\frac{\pi}{2} i,\ i = 0, 1, 2, 3\}$. Therefore let

$$\mathfrak{R}^{\text{geo}}(k) = f_{D(k)}(\hat{X}(k) - \langle \hat{X}(k) \rangle)\, g(\theta_{\hat{X}(k) - \langle \hat{X}(k) \rangle}) \tag{10}$$

define a reliability function which is computed based on the magnitude and phase of the respective $k$th
coefficient alone. A general function which was found to very closely match the exact reliability outcome
(9) for inner constellation points is

$$g(\theta_{\hat{X}(k) - \langle \hat{X}(k) \rangle}) = \frac{\alpha}{\alpha + \beta} + \frac{\beta}{\alpha + \beta} \cos(4 \theta_{\hat{X}(k) - \langle \hat{X}(k) \rangle} + \pi) \tag{11}$$

where $\alpha > \beta > 0$. Furthermore, the aim is to also make $g(\cdot)$ magnitude dependent so that its profile
supported by $[0, 2\pi]$ will be increasingly tapered along $\{\frac{\pi}{2} i,\ i = 0, 1, 2, 3\}$ relative to $\{\frac{\pi}{4} + \frac{\pi}{2} i,\ i = 0, 1, 2, 3\}$
as the magnitude $|\hat{X} - \langle \hat{X} \rangle|$ increases, compared to a fully isotropic profile at vanishingly small
magnitudes. By linearly mapping $\alpha / (\alpha + \beta) \in [1/2, 1]$ to $|\hat{X} - \langle \hat{X} \rangle| \in [0, d_{\min}/\sqrt{2}]$ we finally obtain

$$g(\theta_{\hat{X}(k) - \langle \hat{X}(k) \rangle}) = \frac{\sqrt{2} d_{\min} - |\hat{X}(k) - \langle \hat{X}(k) \rangle|}{\sqrt{2} d_{\min}} + \frac{|\hat{X}(k) - \langle \hat{X}(k) \rangle|}{\sqrt{2} d_{\min}} \cos(4 \theta_{\hat{X}(k) - \langle \hat{X}(k) \rangle} + \pi) \tag{12}$$

which is portrayed in Fig. 2 for different magnitudes. The last approximation we wish to mention is the
simple magnitude-based function

$$\mathfrak{R}^{\text{mag}}(k) = f_{D(k)}(\hat{X}(k) - \langle \hat{X}(k) \rangle) \tag{13}$$

which is completely blind to the other constellation points. Nonetheless, for small $\sigma_D^2$ this approximation
is very efficient, especially for inner points in large constellations. Once the type of function is set and the
vector $\mathfrak{R}$ is computed, we can directly select $\Omega_m$ from (8), fix our model (6), and proceed to recovering
$c$ by CS.
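A geometric reliability of this kind can be sketched as the product of a magnitude term and a phase penalty. The code below is an illustration under explicit assumptions: the penalty is taken to be a raised cosine of the form $\frac{\alpha}{\alpha+\beta} + \frac{\beta}{\alpha+\beta}\cos(4\theta + \pi)$ whose weight grows linearly with the deviation magnitude up to $d_{\min}/\sqrt{2}$, and $f_{D(k)}$ is taken to be an unnormalized Gaussian shape. Function names and the sample deviations are hypothetical.

```python
import numpy as np

def phase_penalty(dev, d_min):
    """Raised-cosine penalty: 1 toward the corners, smallest toward the axes."""
    mag = np.abs(dev)
    theta = np.angle(dev)
    w = mag / (np.sqrt(2) * d_min)      # plays the role of beta / (alpha + beta)
    return (1.0 - w) + w * np.cos(4 * theta + np.pi)

def geometric_reliability(dev, d_min, sigma_d):
    """Magnitude term (assumed Gaussian-shaped f_D) times the phase penalty."""
    f = np.exp(-np.abs(dev) ** 2 / sigma_d**2)
    return f * phase_penalty(dev, d_min)

d_min = 2.0                                   # 16-QAM with levels -3, -1, 1, 3
dev_axis = 0.8 * np.exp(1j * 0.0)             # deviation toward a nearest neighbor
dev_diag = 0.8 * np.exp(1j * np.pi / 4)       # same magnitude, toward a corner

print(geometric_reliability(dev_axis, d_min, 1.0),
      geometric_reliability(dev_diag, d_min, 1.0))
```

Both deviations have the same magnitude, so a magnitude-only measure would rank them equally, while the phase penalty favors the one pointing away from the nearest neighbors. Only the observation itself is needed, with no loop over other constellation points.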

To verify the approach, we used two different schemes of CS to recover $c$ from the developed CS model in (6),
one from the convex relaxation group and the other from greedy pursuit methods. More specifically, the
first is a weighted and phase-augmented LASSO [9], which we refer to as WPAL, a data-aided
modification of the standard LASSO that incorporates data in the time domain to improve distortion
recovery, and can be defined as

$$\hat{c} = \arg\min_c \big\| \, \big| |F^H \hat{X}| - \gamma \big|^T c \, \big\|_1 \quad \text{s.t.} \quad \|Y'_{\Omega_m} - \Psi_{\Omega_m} c\|_2 < \epsilon \tag{14}$$

for some noise-dependent parameter $\epsilon$. The other technique is the Bayesian Matching Pursuit (BMP) by
Schniter et al. [8], chosen for its superior performance and efficiency when a relatively large number of
measurements is available to it, a luxury we can actually enjoy in this work, unlike when pilot reservation
is used to construct the observation vector $Y'_{\Omega_m}$ and an extreme economy in tones is enforced to preserve
data rate [2].

Fig. 2. Illustration of the phase penalty function $g(\theta_{\hat{X}(k) - \langle \hat{X}(k) \rangle})$ expressed in (12). The function is normalized, and
therefore the outer circle-shaped curves actually correspond to the smallest magnitudes, and become more tapered as
$|\hat{X}(k) - \langle \hat{X}(k) \rangle|$ increases.
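For readers who want to experiment with the recovery step, the following sketch uses plain Orthogonal Matching Pursuit (OMP) as a stand-in greedy method. It is neither WPAL nor BMP, and it is shown only because it is short and self-contained. The setup (noise-free partial Fourier measurements, 32 randomly chosen tones out of 64, a 3-sparse `c`) is an illustrative assumption.

```python
import numpy as np

def omp(Psi, y, k):
    """Greedy sparse recovery: pick k atoms, refit by least squares each time."""
    n = Psi.shape[1]
    residual = y.copy()
    support = []
    coef = np.zeros(0, dtype=complex)
    for _ in range(k):
        corr = np.abs(Psi.conj().T @ residual)   # correlation with each atom
        if support:
            corr[support] = 0.0                  # never pick an atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Psi[:, support], y, rcond=None)
        residual = y - Psi[:, support] @ coef
    c_hat = np.zeros(n, dtype=complex)
    c_hat[support] = coef
    return c_hat, sorted(support)

rng = np.random.default_rng(2)
N, m, k = 64, 32, 3
F = np.fft.fft(np.eye(N), norm="ortho")          # unitary DFT matrix
omega_m = np.sort(rng.choice(N, m, replace=False))
Psi = F[omega_m, :]                              # partial Fourier sensing matrix

c = np.zeros(N, dtype=complex)
c[rng.choice(N, k, replace=False)] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
y = Psi @ c                                      # noise-free observations (assumed)

c_hat, support = omp(Psi, y, k)
print("estimated support:", support)
print("residual norm:", np.linalg.norm(y - Psi @ c_hat))
```

OMP needs the sparsity level `k` as an input and has no notion of measurement reliability, so it should be read as a baseline for the model rather than as a replacement for the two schemes used in the paper.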

V. Simulation Results 

The methods proposed in this paper were tested on an OFDM signal of 64 subcarriers drawn from a
16-QAM constellation. The signal was subject to a block-fading, frequency-selective Rayleigh channel
model with an SNR of 25 dB per bit, and a severe clipping level (defined as $10 \log \gamma^2 / \sigma_x^2$) of 2 dB.
No bit loading (i.e. no variation of constellation size per carrier SNR), diversity gain, or error control
coding was considered. Special packages for convex programming and greedy pursuit [8] were used
to implement our CS algorithms.

Fig. 3 shows the result of using WPAL (14) with the proposed reliability criteria in Section IV for choosing
the measurement tone set $\Omega_m$. We plotted the results against an increasing number of observed tones,
such that, for instance, the 10 most reliable observations are used, compared to using the 20 most reliable
observations, and so on. In doing so we expect a somewhat convex behavior of the SER as a function of
$|\Omega_m|$, since generally the more observations we use the better the performance of CS algorithms becomes
(up to some typical saturation level), but then due to the increased amount of erroneous observations
supplied as $|\Omega_m|$ increases, the performance eventually deteriorates. The simulation results confirm this
intuition, and also confirm the relative performance of the three methods proposed in (9), (10), and (13),
denoted by $\Omega_m^{\text{Bayes}}$, $\Omega_m^{\text{geo}}$, and $\Omega_m^{\text{mag}}$, respectively, as well as the reversed relative performance of the least
reliable tone set of each, which we generically denote by $\arg \{\mathfrak{R}_{i:N}\}_{i=1}^{m}$.

Fig. 3. SER vs. $|\Omega_m|$ for the various reliability functions defined in (9), (10), and (13), and their least reliable counterparts.

Furthermore, using our practical reliability function (10) based on (12), we compared our results with
what we consider the most popular nonlinear distortion mitigation techniques in the literature, namely,
the Iterative ML Decoding (ItML) [4] and the Decision-Aided Reconstruction (DAR) [5] techniques. In
addition, we also implemented the Quasi-ML technique in [6], which proposed improving the algorithm
in [4] by refraining from making hard decisions when the absolute value of the real or imaginary part
of the frequency deviation is larger than some linear function $\epsilon$ of $d_{\min}$. Results in Fig. 4 show the
superior performance of using BMP [8] over the set $\Omega_m^{\text{geo}}$, using only half the tones to reach the optimum
performance. The WPAL performs significantly better than Zero Forcing (ZF), and can be used to improve
the results of ItML, even though it performs less efficiently alone under most circumstances. Lastly, no
gain is achieved by supplying the BMP estimate to ItML, as BMP alone normally outperforms this
procedure.

VI. Conclusion

A novel method has been proposed to use data-aided CS techniques over a reliable subset of
observations in the frequency domain in order to estimate and cancel sparse nonlinear distortion on an
OFDM signal in the time domain. Moreover, a newly developed method of computing the reliability of
each observation independently of the other $M - 1$ candidates within a constellation was also proposed
and tested. The methods offer promising performance, and the authors are considering several possible
improvements such as invoking soft decoding and CNR maximization.